fix: agent reconciles pipeline directory on every poll #10
After each successful poll, scan the pipelines directory and remove any YAML files (and their .vf-metrics.yaml sidecars) not present in the server response. This prevents stale configs from accumulating after node re-enrollment into a different environment or missed undeploys. Also extracts a configFetcher interface from the poller to enable unit testing without an HTTP server.
Greptile Summary
This PR fixes orphaned pipeline YAML files persisting on disk after a node is re-enrolled into a different environment by adding a disk-reconciliation pass at the end of
Confidence Score: 3/5
agent/internal/agent/poller.go
    entries, readErr := os.ReadDir(pipelinesDir)
    if readErr == nil {
Silent reconciliation failure
If os.ReadDir fails (e.g., a transient permission or I/O error), the entire reconciliation is silently skipped with no log message. This makes the failure mode invisible — orphaned files would accumulate with no diagnostic, and the only clue would be the missing "removing orphaned pipeline config" log lines.
Consider logging the error before bailing:
    - entries, readErr := os.ReadDir(pipelinesDir)
    - if readErr == nil {
    + entries, readErr := os.ReadDir(pipelinesDir)
    + if readErr != nil {
    +     slog.Warn("failed to read pipelines dir for reconciliation", "error", readErr)
    + } else {
    slog.Warn("removing orphaned pipeline config", "pipelineId", id)
    os.Remove(filepath.Join(pipelinesDir, name))
    os.Remove(filepath.Join(pipelinesDir, name+".vf-metrics.yaml"))
    }
Unlogged os.Remove errors in reconciliation block
Both os.Remove calls discard their return values silently, even though a slog.Warn is already emitted a line above. Compare this to the supervisor's equivalent cleanup at supervisor.go:205–206, which logs a warning when the remove fails. If an orphaned file can't be deleted (e.g., locked by another process or a permission issue), there's no indication of the failure.
    - slog.Warn("removing orphaned pipeline config", "pipelineId", id)
    - os.Remove(filepath.Join(pipelinesDir, name))
    - os.Remove(filepath.Join(pipelinesDir, name+".vf-metrics.yaml"))
    - }
    + slog.Warn("removing orphaned pipeline config", "pipelineId", id)
    + if err := os.Remove(filepath.Join(pipelinesDir, name)); err != nil && !os.IsNotExist(err) {
    +     slog.Warn("failed to remove orphaned pipeline config", "path", name, "error", err)
    + }
    + if err := os.Remove(filepath.Join(pipelinesDir, name+".vf-metrics.yaml")); err != nil && !os.IsNotExist(err) {
    +     slog.Warn("failed to remove orphaned metrics sidecar", "path", name+".vf-metrics.yaml", "error", err)
    + }
Log ReadDir errors during reconciliation instead of silently skipping. Log os.Remove failures for orphaned files, consistent with supervisor cleanup pattern. Guard with !os.IsNotExist to avoid noisy logs.
Summary
- Scans the pipelines/ directory after each successful poll and removes any YAML files not present in the server response
- Also removes .vf-metrics.yaml sidecar files for orphaned pipelines
- Extracts a configFetcher interface to enable unit testing the poller without HTTP

Problem
When a node was deleted and re-enrolled into a different environment, pipeline YAML files from the previous environment remained on disk. The poller's in-memory known map started empty on re-enrollment, so it had no awareness of pre-existing files. This caused the agent to run pipelines from both the old and new environments.

Changes
- agent/internal/agent/poller.go: added directory reconciliation block + configFetcher interface
- agent/internal/agent/poller_test.go: two tests covering orphan removal and empty-response cleanup

Test plan
- TestPoll_RemovesOrphanedPipelineFiles: orphans deleted, valid pipelines and non-YAML files preserved
- TestPoll_EmptyResponseCleansAllFiles: empty server response removes all YAML files
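
The failure mode described under Problem can be shown without touching the filesystem. The sketch below contrasts diffing against an in-memory map with diffing against the file names actually on disk. Both function names, the sample file names, and the "ID is the file name minus .yaml" rule are illustrative assumptions, not code from the PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// orphansFromKnown mimics a poller that only compares its in-memory map
// against the server response.
func orphansFromKnown(known, wanted map[string]bool) []string {
	var out []string
	for id := range known {
		if !wanted[id] {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

// orphansFromDisk compares what is actually on disk against the server
// response, so it does not depend on state that is lost on re-enrollment.
func orphansFromDisk(fileNames []string, wanted map[string]bool) []string {
	var out []string
	for _, name := range fileNames {
		if !strings.HasSuffix(name, ".yaml") || strings.HasSuffix(name, ".vf-metrics.yaml") {
			continue
		}
		if id := strings.TrimSuffix(name, ".yaml"); !wanted[id] {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	onDisk := []string{"old-env-a.yaml", "old-env-a.yaml.vf-metrics.yaml", "new-env-b.yaml"}
	wanted := map[string]bool{"new-env-b": true}
	known := map[string]bool{} // starts empty after re-enrollment

	fmt.Println("found via known map:", len(orphansFromKnown(known, wanted)))
	fmt.Println("found via disk scan:", orphansFromDisk(onDisk, wanted))
}
```

With an empty known map the first approach finds nothing to remove, while the disk scan still identifies the leftover file from the previous environment.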